A Synchronous Mode MPI Implementation on the Cell BE™ Architecture
Abstract
The Cell Broadband Engine is a new heterogeneous multi-core processor from IBM, Sony, and Toshiba. It contains eight co-processors, called SPEs, each of which operates directly on its own 256 KB local store and also has access to a shared main memory of 512 MB to 2 GB. The combined peak speed of the SPEs is 204.8 Gflop/s in single precision and 14.64 Gflop/s in double precision. There is, therefore, much interest in using the Cell for high performance computing applications. However, the unconventional architecture of the SPEs, in particular their local store, creates some programming challenges. We describe our implementation of certain core features of MPI, which can help meet these challenges by enabling a large class of MPI applications to be ported to the Cell. This implementation views each SPE as a node for an MPI process, with the local store abstracted in the library and thus hidden from the application with respect to MPI calls. We further present experimental results on the Cell hardware, where the implementation demonstrates good performance, such as throughput up to 6.01 GB/s and latency as low as 0.65 μs on the pingpong test. The significance of this work lies in (i) demonstrating that the Cell has good potential for running intra-Cell MPI applications, and (ii) enabling such applications to be ported to the Cell with minimal effort.
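The throughput and latency figures quoted for the pingpong test are conventionally derived from round-trip timings. The following is a minimal Python sketch, assuming the common convention that one-way latency is half the average round-trip time and that throughput is the message size divided by that one-way time. The abstract does not state the exact measurement method, so this convention is an assumption; the function name pingpong_metrics and the sample numbers are hypothetical and are not taken from the paper.

```python
def pingpong_metrics(message_bytes, total_seconds, iterations):
    """Derive pingpong metrics from a timed loop of round trips.

    message_bytes: size of the message sent in each direction.
    total_seconds: wall-clock time for all round trips together.
    iterations:    number of round trips that were timed.

    Returns (one-way latency in microseconds, throughput in GB/s),
    assuming one-way time = round-trip time / 2 (a common convention,
    not confirmed by the source) and 1 GB = 1e9 bytes.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")

    # Average time for a single one-way transfer.
    one_way_seconds = total_seconds / (2 * iterations)

    latency_us = one_way_seconds * 1e6
    throughput_gb_s = message_bytes / one_way_seconds / 1e9
    return latency_us, throughput_gb_s


if __name__ == "__main__":
    # Hypothetical timing: 1000 round trips of a 1 MB message in 2 s.
    latency, throughput = pingpong_metrics(1_000_000, 2.0, 1000)
    print(f"one-way time: {latency:.1f} us, throughput: {throughput:.2f} GB/s")
```

In practice, latency is read off from the smallest message sizes and throughput from the largest, since the fixed per-message cost dominates the former and the transfer rate dominates the latter.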
Similar resources
A Buffered-Mode MPI Implementation for the Cell BE™ Processor
The Cell Broadband Engine™ is a heterogeneous multi-core architecture developed by IBM, Sony and Toshiba. It has eight computation-intensive cores (SPEs), each with a small local memory, and a single PowerPC core. The SPEs have a total peak performance of 204.8 Gflop/s in single precision and 14.64 Gflop/s in double precision. Therefore, the Cell has good potential for high performance computing. ...
Design and Implementation of Interior Permanent Magnet Synchronous Motor (IPMSM) Control based on Integral Terminal Sliding Mode Technique
Because of its high energy storage capability, the Permanent Magnet Synchronous Motor is very important in the electrical drive industry. Speed control of this motor suffers from parameter variations such as variable inductance. In this paper, the Integral-Terminal Sliding Mode Control (ITSMC) method is used to control the speed (torque) along with d-axis current control. This method is similar to classic slid...
Plasma Simulation on Networks of Workstations using the Bulk-Synchronous Parallel Model
Computationally intensive applications with frequent communication and synchronization require careful design for efficient execution on networks of workstations. We describe a Bulk-Synchronous Processing (BSP) model implementation of a plasma simulation and the use of BSP analysis techniques for tuning the program for arbitrary architectures. In addition, we compare the performance of the BSP implem...
xBSP: An Efficient BSP Implementation for cLAN
Virtual Interface Architecture (VIA) is a light-weight protocol for protected user-level zero-copy communication. In spite of the high performance of VIA, the previous MPI implementation for GigaNet’s cLAN showed low communication performance. The main sources of the low performance are the mismatch between the communication models of MPI and VIA, and multi-threading overhead. In this paper, we propo...
Architecture independent parallel binomial tree option price valuations
We introduce an architecture independent approach in describing how computations such as those involved in American or European-style option valuations can be performed in parallel in the binomial-tree model. In particular we present an algorithm for the multiplicative binomial tree option-pricing model that can also be directly generalized to the general additive binomial tree model. The algor...
Publication date: 2007